PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. The chart can be viewed at several levels of sequence identity. The cumulative bars represent the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 95% sequence identity

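| Year | Number of New Protein Sequences | Total Number of Protein Sequences |
|------|---------------------------------|-----------------------------------|
| 1976 | 13 | 13 |
| 1977 | 10 | 23 |
| 1978 | 3 | 26 |
| 1979 | 6 | 32 |
| 1980 | 4 | 36 |
| 1981 | 10 | 46 |
| 1982 | 18 | 64 |
| 1983 | 11 | 75 |
| 1984 | 11 | 86 |
| 1985 | 12 | 98 |
| 1986 | 9 | 107 |
| 1987 | 11 | 118 |
| 1988 | 25 | 143 |
| 1989 | 46 | 189 |
| 1990 | 52 | 241 |
| 1991 | 56 | 297 |
| 1992 | 66 | 363 |
| 1993 | 232 | 595 |
| 1994 | 461 | 1,056 |
| 1995 | 343 | 1,399 |
| 1996 | 407 | 1,806 |
| 1997 | 561 | 2,367 |
| 1998 | 756 | 3,123 |
| 1999 | 896 | 4,019 |
| 2000 | 1004 | 5,023 |
| 2001 | 1043 | 6,066 |
| 2002 | 1112 | 7,178 |
| 2003 | 1558 | 8,736 |
| 2004 | 2118 | 10,854 |
| 2005 | 2338 | 13,192 |
| 2006 | 2645 | 15,837 |
| 2007 | 2966 | 18,803 |
| 2008 | 2759 | 21,562 |
| 2009 | 2817 | 24,379 |
| 2010 | 2870 | 27,249 |
| 2011 | 2638 | 29,887 |
| 2012 | 2888 | 32,775 |
| 2013 | 3096 | 35,871 |
| 2014 | 3799 | 39,670 |
| 2015 | 3144 | 42,814 |
| 2016 | 3722 | 46,536 |
| 2017 | 3998 | 50,534 |
| 2018 | 3730 | 54,264 |
| 2019 | 4051 | 58,315 |
| 2020 | 5003 | 63,318 |
| 2021 | 4506 | 67,824 |
| 2022 | 5480 | 73,304 |
| 2023 | 5150 | 78,454 |
| 2024 | 5436 | 83,890 |
| 2025 | 6191 | 90,081 |
| 2026 | 1360 | 91,441 |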
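
The cumulative column is the running sum of the yearly column. The following is a minimal Python sketch of that relationship, using only the first five rows of the table as sample data. The helper names `cumulative_totals` and `find_mismatches` are hypothetical and chosen for illustration; they are not part of any RCSB PDB tool or API.

```python
from itertools import accumulate

# (year, new sequences, cumulative total) copied from the first five table rows
rows = [
    (1976, 13, 13),
    (1977, 10, 23),
    (1978, 3, 26),
    (1979, 6, 32),
    (1980, 4, 36),
]


def cumulative_totals(new_counts):
    """Return the running sum of the yearly counts."""
    return list(accumulate(new_counts))


def find_mismatches(rows):
    """Return (year, listed_total, expected_total) for rows whose listed
    cumulative total differs from the running sum of the yearly counts."""
    expected = cumulative_totals(new for _, new, _ in rows)
    return [
        (year, total, exp)
        for (year, _, total), exp in zip(rows, expected)
        if total != exp
    ]


print(cumulative_totals([new for _, new, _ in rows]))  # [13, 23, 26, 32, 36]
print(find_mismatches(rows))  # [] means the sample rows are consistent
```

The same check can be applied to the full table by extending `rows`. An empty result means every listed total matches the running sum of the yearly counts.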